Lesson 7: Files, Databases, and Pickles

Persistence

So far, we have learned how to write programs and communicate our intentions to the Central Processing Unit using conditional execution, functions, and iterations. We have learned how to create and use data structures in the Main Memory. The CPU and memory are where our software works and runs. This is where all of the "thinking" happens.

But once the power is turned off, anything stored in either the CPU or main memory is erased. So up to now, our programs have just been transient fun exercises to learn Python.

In this lesson, we start to work with Secondary Memory. Secondary memory is not erased even when the power is turned off. In the case of a USB flash drive, the data we write from our programs can even be removed from the system and transported to another system.

Other programs are persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (a hard drive, for example); and if they shut down and restart, they pick up where they left off.

Examples of persistent programs are operating systems, which run pretty much whenever a computer is on, and web servers, which run all the time, waiting for requests to come in on the network.

One of the simplest ways for programs to maintain their data is by reading and writing text files.

An alternative is to store the state of the program in a database. In this lesson I will present a simple database and a module, pickle, that makes it easy to store program data.

We will primarily focus on reading and writing text files such as those we create in a text editor.

Files

A text file is a sequence of characters stored on a permanent medium like a hard drive, flash memory, or CD-ROM.

First Things First

For the examples in this lesson we need a few files.

The first one is called words.txt and it is a list of 113,809 official crossword words; that is, words that are considered valid in crossword puzzles and other word games. This list is part of the Moby lexicon project (see http://wikipedia.org/wiki/Moby_Project).

The second one is a list of emails from an open source coding project, called mbox.txt.

The third is from Act 2, Scene 2 of Romeo and Juliet, called romeo-full.txt.

You can download them here:

For convenience, save these files in the folder that you are in when you start Python. To find this folder, open IDLE and then go to File > Save As.

screenshot

The folder that is displayed by default is where you should save these files.

screenshot

In the above example, I want to save the files in the \AppData\Local\Programs\Python\Python35-32\ folder.
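
If you are not sure which folder Python starts in, a minimal sketch like the following may help. It uses the standard os module to print the current working directory and to check whether the three lesson files are present there. The variable names are illustrative only.

```python
import os

# The folder Python is currently working in; open('words.txt') looks here.
current_folder = os.getcwd()
print(current_folder)

# Check which of the lesson files are present in that folder.
lesson_files = ['words.txt', 'mbox.txt', 'romeo-full.txt']
for name in lesson_files:
    found = os.path.exists(os.path.join(current_folder, name))
    print(name, 'found' if found else 'missing')
```

If a file is reported as missing, move it into the folder that was printed on the first line.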

Opening Files

When we want to read or write a file, we first must open the file. Opening the file communicates with your operating system, which knows where the data for each file is stored. When you open a file, you are asking the operating system to find the file by name and make sure the file exists.

The file words.txt is in plain text, so you can open it with a text editor, but you can also read it from Python. The built-in function open takes the name of the file as a parameter and returns a file object you can use to read the file.

fin = open('words.txt')

fin is a common name for a file object used for input.

If we display the value of fin, we get this:

Code:
fin = open('words.txt')
print(fin)

Output:
<_io.TextIOWrapper name='words.txt' mode='r' encoding='cp1252'>

If the open is successful, the operating system gives us a file handle. The file handle is not the actual data contained in the file, but instead it is a "handle" that we can use to read the data. You are given a handle if the requested file exists and you have the proper permissions to read the file.
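
To make the difference between the handle and the data concrete, here is a small sketch. So that it runs even without words.txt, it first creates a tiny stand-in file in a temporary folder; the name sample_words.txt and its three words are made up for this illustration, and writing files is covered properly later.

```python
import os
import tempfile

# Create a tiny stand-in file so the sketch runs without words.txt.
# (sample_words.txt is a hypothetical name, not one of the lesson files.)
sample_path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')
fout = open(sample_path, 'w')
fout.write('aa\naah\naahed\n')
fout.close()

# Opening the file gives us a handle, not the data itself.
fin = open(sample_path)
print(fin.name)   # the name we asked the operating system to find
print(fin.mode)   # 'r' means the file was opened for reading

# The data only arrives when we ask the handle for it.
first_line = fin.readline()
print(first_line.strip())

# Close the handle when we are done with it.
fin.close()
```

The handle knows the file's name and mode right away, but no words are read until readline is called.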


If the file does not exist, open will fail with a traceback and you will not get a handle to access the contents of the file:

Code:
fin = open('stuff.txt')
print(fin)

Output:
FileNotFoundError: [Errno 2] No such file or directory: 'stuff.txt'

Later we will use try and except to deal more gracefully with the situation where we attempt to open a file that does not exist.
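
As a preview, the idea might look roughly like the sketch below. The helper name try_open is hypothetical and chosen only for this illustration; it returns a file handle when the file can be opened and None when the file does not exist.

```python
def try_open(filename):
    """Return a file handle for filename, or None if the file is missing."""
    try:
        return open(filename)
    except FileNotFoundError:
        # Instead of a traceback, print a short message and carry on.
        print('File cannot be opened:', filename)
        return None


# 'stuff.txt' is the missing file from the example above.
fin = try_open('stuff.txt')
if fin is None:
    print('No handle was returned, so there is nothing to read.')
else:
    fin.close()
```

With this approach the program keeps running and can decide what to do next, rather than stopping at the first missing file.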